Responsibilities | SRE
- Monitor and Maintain Systems: Oversee systems and infrastructure to ensure optimal operational performance and reliability.
- On-Call Rotation: Participate in rotational on-call duties to address issues as they arise.
- Collaborate and Troubleshoot: Work closely with other SRE professionals and engineers to troubleshoot and resolve incidents, offering consultation and guidance on issues.
- Proactive Issue Prevention: Anticipate potential issues and collaborate with teams to find solutions before they impact performance.
- Analyze Metrics: Collect and analyze data from system and application logs to support performance optimization, fault detection, and troubleshooting.
- Automation & Process Improvement: Build automated solutions to streamline operations, reduce manual tasks, and improve overall system reliability.
- Collaborate with Support Teams: Partner with application, infrastructure, and operations teams to ensure smooth system management.
- Post-Incident Reviews: Participate in post-incident reviews to identify root causes and recommend improvements to prevent recurrence.
Required Knowledge, Skills, and Abilities
- Server Administration: Proficient in troubleshooting and managing Linux and Windows servers, including patching and basic scripting (PowerShell, Bash).
- Converged Solutions: Familiarity with VCE/UCP (VMware 6+), platform/network connectivity, patching, and threat remediation. PowerShell and Linux scripting knowledge is essential.
- Storage Management: Understanding of CIFS/NFS, Avamar, Data Domain, and DPA reporting in both Linux and Windows environments.
- Middleware: Experience with WebSphere, Apache, IIS, WebLogic, and Tomcat on Linux and Windows.
- Mainframe Systems: Knowledge of JCL, CICS, and SYSPLEX environments.
- Networking: Deep understanding of networking protocols and the OSI model; Network+ certification.
- Service Management Tools: Proficiency with ServiceNow for workflow and knowledge management.
- Collaboration Tools: Experience with TrueSight, Jira, and Confluence.
- Process & ITSM: Familiarity with IT Service Management (ITSM) practices, including Lean methodology and operational performance analytics.
- Problem-Solving: Strong troubleshooting skills with the ability to analyze and resolve complex technical challenges.
- ITIL Knowledge: Familiarity with ITIL processes, including Problem Management, Change Management, Release Management, Event Management, and Incident Management.
Education / Experience
- Bachelor’s degree in Engineering, Computer Science, or related field required
- 8+ years of experience in an engineer role (lead)
- 2 years of experience supporting a large enterprise center